Abstract. A Treatise on Probability was published by John Maynard Keynes in
1921. The Treatise contains a critical assessment of the philosophical foundations
of probability and of the statistical methodology of the time. We review the aspects
of the book that are most closely related to statistics, avoiding uninteresting neophyte's
forays into philosophical issues. In particular, we examine the arguments provided
by Keynes against the Bayesian approach, as well as the sketchy alternative he
proposes, a return to Lexis' theory of analogies. Our conclusion is that the
Treatise is a scholarly piece of work looking at past advances rather than producing
directions for the future.
1 Introduction



A Treatise on Probability is John Maynard Keynes' polishing of his 1907 and 1908
Cambridge Fellowship dissertation (The Principles of Probability, submitted to King's
College) into a book, after an interruption due to the war (for censorship reasons, as
Keynes was then an advisor to the government). Although the author revised this
dissertation in 1914 (as mentioned by Aldrich 2008a) and then in 1920 towards a general
audience, the original potential readers of A Treatise on Probability were therefore
mostly local academics, among them his Cambridge colleagues Edgeworth and Yule. Despite
Keynes' lasting interest in statistics, this is also his most significant publication in
this field, since his research focus had moved to economics by then (as shown by, e.g.,
Keynes 1919). In contrast with the immense influence Keynes exerted and still exerts in
this latter field, and in agreement with the fact that the original version was an
internal dissertation, the impact of A Treatise on Probability was very limited, for
reasons discussed below.

In this review, we consider the most relevant parts of the book, solely from a
statistical perspective, avoiding the outdated philosophical debates about the nature of
probability and of induction that constitute most of the Treatise and do not overlap
with our personal interests. One must recall that this philosophical part on the
foundations of probability is a common feature in books of the period, as shown by the
first fifty pages of Jeffreys (1937) dedicated to "direct probabilities". Interestingly,
Keynes favours a more subjective view of probability as a degree of belief, à la
de Finetti (1974), while Jeffreys settles for a mathematical definition that implies
there is only one "type" of probability. It is also worth reminding the reader at this
stage that Andrej Kolmogorov's book laying the axioms of modern probability was only
published in 1933, since this explains why the concept of probability was still under
debate at the time in both mathematical and philosophical circles.



As in the parallel review of Jeffreys (1937) we undertook in Robert et al. (2009),
there is no attempt at drawing a history of statistics in this review, which is rather to
be taken as the reflection of a modern reader upon a piece of work written one century ago.
For earlier historical aspects on the evolution of inverse probability as a central piece
of statistical thinking in the 19th Century, we refer the reader to the comprehensive
coverage in Dale (1999), which, ironically, stops its range at Karl Pearson, i.e. just
before Keynes briefly entered the statistical scene. For a broader historical perspective
on the development of statistics and the state of statistics at the time of the Treatise,
Stigler (1986) undoubtedly remains the essential reference.



Before engaging upon this review, we point out that the Treatise has been previously
assessed by Stigler (2002), who was similarly critical of the depth of the book, and by
Aldrich (2008a), who produced an extensive and scholarly survey on the impact (or
lack thereof) of Keynes on the philosophical and statistical communities at the time.
The latter includes in particular a detailed study of the reviews written on A Treatise
on Probability by philosophers and statisticians of the early 20th Century, including
Ronald Fisher (1923) and Harold Jeffreys (1931).



2 Contents of the Treatise 

" A definition of probability is not possible, unless it contents us to define 
degrees of the probability-relation by reference to degrees of rational belief. " 
A Treatise on Probability, page 8. 

As clearly stated in the above quote, the proclaimed and ambitious goal of A Treatise
on Probability is to establish a logical basis for probability and to draw a new
"constructive" approach for statistical induction. The extremely strong views contained
in the book, as well as the highly critical reassessments of past and (then) current
authors, like Laplace and Pearson, reflect the youth of the author and his
earlier dispute with Karl Pearson on correlation. The extensive coverage of the
(statistical if not probabilistic) literature of the time and the comprehensive (if not
always insightful) discussion of most theories in competition show the extent of
Keynes' scholarly expertise in statistics.

The Treatise comprises 33 chapters, grouped into five parts:

I. Fundamental Ideas;
II. Fundamental Theorems;
III. Induction and Analogy;
IV. Some Philosophical Applications of Probability;
V. The Foundations of Statistical Inference.

The first part sets the logical and philosophical grounds for establishing a theory of
probability, also touching upon the Principle of Indifference treated below in Section 9.1.
The second part is about probability axioms seen from a mathematical logic perspective,
although the mathematical depth is somewhat limited. This part also contains a chapter
on Inverse Probability, as discussed in Section 9. The third part is mostly philosophical
and discusses Humean induction with few connections with statistical inference. Part
IV is a short metaphysical digression on the meaning of randomness and its impact on
conduct, completely unrelated to statistical inference. The statistical entries in the
Treatise are mostly found in Part V, which covers convergence theorems (the "Law of
Great Numbers" and the "Theorems of Bernoulli, Poisson and Tchebycheff"), Bayesian
inference and a call for a return to the "Continental" principles laid by Lexis. As
explored below, the amount of methodological innovation found in the book is extremely
limited, in line with Keynes' own acknowledgement that he is "unlikely to get much
further".



3 A restricted perspective 



From a statistician's viewpoint, the innovative aspects of the Treatise are quite
limited in that the statistical discourse remains at a highly rhetorical (as opposed to
methodological) level, drafting in vague terms a direction for prospective followers
who never materialised. While the Treatise presents both a historical (Dale 1999)
and a philosophical interest, from the perspectives both of Keynes' academic career and
of the foundations of statistics, there is no statistical advance to be found in it. For
instance, the Treatise misses the (then) current developments towards a comprehensive
theory of statistical tests, started with Karl Pearson's and William Gosset's t tests,
and about to culminate in Fisher (1925). Given the contents of the soon-to-come major
advances represented by not only Fisher's (1925) Statistical Methods, but also Jeffreys'
(1939) and de Finetti's (1937, reprinted as 1974) homonymous Theory of Probability,
the Treatise does not stand the comparison, as it fails to provide even a thorough
treatment of the statistical theory of the time, let alone to propose advances in this
domain. This lack of innovative material, along with the harsh tone of a critic who had
contributed so little to the field, may explain why Keynes' incursion into probability and
statistics did not have a lasting impact, since even those most sympathetic to the book
(Jeffreys 1931, Lindley 1968) saw no practical or methodological aspect to draw from
and praised aspects external to their own field (Aldrich 2008a). Stigler (2002,
pp. 161-162) similarly questions the worth of A Treatise on Probability as a mathematical
and statistical work, with its almost sole focus on "the binomial world", and he considers
the book unable to "carry the weight of a serious social scientific investigation".
"Statistical techniques tell us how to 'count the cases'." A Treatise on
Probability, page 392.

Keynes spends the major portion of the Treatise decrying a large part of the (then)
current statistical practice (first and foremost, Bayesian^1 statistics) as well as a
majority of the past and (then) current statisticians, and reproducing the arguments of
other (and more Continental) researchers, like Boole, Lexis, or von Kries. (Again, this is
in line with our argument that the book is a scholarly and critical memoir rather than
an innovative manifesto, even though the author aimed at a broader impact on the
statistics community at the time.) Furthermore, most of Part V deals with observation
frequencies (hence the quote above) and their stabilisation.

Quite curiously, when compared with, say, the much more modern treatise by Jeffreys
(1931),^2 this book does not contain analyses of realistic datasets, except when
criticising von Bortkiewicz's theory through the Prussian cavalry horsekick data, which
is customarily used for introducing Poisson modelling and is available in R as the
prussian dataset (R Development Core Team 2006). This is somewhat surprising when
considering the main research field of Keynes, namely Economics, where examples of
considerable interest abound. Instead, a very small number of (academic) examples, like
the proportion of boys in births, are recurrently discussed throughout the book.
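
For readers who wish to see what the customary Poisson treatment of the horsekick data looks like, here is a minimal Python sketch. It assumes the classic textbook tabulation of von Bortkiewicz's counts (200 corps-years, with 109, 65, 22, 3 and 1 of them recording 0 to 4 deaths), which may be laid out differently from the R dataset mentioned above; the variable and function names are ours.

```python
import math

# Assumed data: classic tabulation of von Bortkiewicz's horsekick deaths
# (10 Prussian army corps over 20 years, i.e. 200 corps-years).
# observed[k] = number of corps-years with k deaths.
observed = {0: 109, 1: 65, 2: 22, 3: 3, 4: 1}

n = sum(observed.values())                              # number of corps-years
total_deaths = sum(k * c for k, c in observed.items())
rate = total_deaths / n                                 # Poisson MLE = sample mean


def poisson_pmf(k, lam):
    """Probability of k events under a Poisson(lam) distribution."""
    return math.exp(-lam) * lam ** k / math.factorial(k)


# Expected number of corps-years with k deaths under the fitted model.
expected = {k: n * poisson_pmf(k, rate) for k in observed}

if __name__ == "__main__":
    print(f"estimated rate: {rate:.2f} deaths per corps-year")
    for k in observed:
        print(f"{k} deaths: observed {observed[k]:3d}, expected {expected[k]:6.1f}")
```

Comparing the observed and expected columns is the informal goodness-of-fit check usually shown with this example; a formal test would be a separate step.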

To be complete about the statistical contents of A Treatise on Probability, we note
that Part II on Fundamental Theorems also contains a chapter on the properties of
various estimators of the mean in connection with the distribution of the observations,
although Keynes dismisses its importance by stating "It is without philosophical interest
and should probably be omitted by most readers" (page 186). This chapter actually
reproduces Keynes' only genuine statistical paper, published in the Journal of the Royal
Statistical Society in 1911 on the theory of averages (to be discussed in Section 7).



^1 It is worth pointing out that the denomination of 'Bayesian' appeared much later (Fienberg 2006),
Keynes resorting to the (then) current denomination of Inverse Probability or, in a more derogatory way,
"the Laplacian theory of 'unknown probabilities'" (page 372). Referring to the authoritative arguments
of Aldrich (2008a), we stress that the papers of Keynes on statistics were definitely Bayesian, with
most of his analysis being based on a uniform prior. He later started to worry about the influence of
the prior, leading to the Treatise and to its harsh criticism of inverse probability, even though some
Bayesian arguments remain in use within the book (see footnote 4). This is quite in agreement with
the practice of the day, with statisticians mixing frequency and inverse probability arguments, Pearson
and Fisher included, even though earlier books like Bertrand (1889) had pointed out the distinction
between objective and subjective probabilities.

^2 Jeffreys (1922) reviewed the Treatise for Nature. His review was quite benevolent, despite most of
Keynes' perspectives on statistics being foreign to his own. This may explain why Part V was mostly
bypassed by Jeffreys' review. His comments in Scientific Inference (1931, pp. 222-224) about Keynes'
refusal to admit that probabilities were numbers, hence are comparable, and in Theory of Probability
(1939, p. 25) about Keynes' unwillingness to generalise axioms, more truthfully reflect Jeffreys' global
opinion about the book.
4 The low role of models

"The knowledge of statistical theory, which is required for this, travels,
I find, quite outside my knowledge." Letter to Pearson, 1915.



Throughout the book, Keynes holds both probability as a mathematical theory and
probabilistic models as the basis for statistics in very low regard, considering that
unknown probabilities do not exist and that, apart from urn models, the reproducibility
of experiments is almost always questionable (as shown by the quote "Some statistical
frequencies are, with narrower or wider limits, stable. But stable frequencies are not
very common, and cannot be assumed lightly", page 336). When adopting this type of
reasoning, Keynes thus falls into what we call an "ultra-conditioning fallacy", namely
that the more covariates one conditions upon, the more differently the individuals behave,
a point of view that undermines statistical practice because there can then be
no frequency stabilisation.^3 For instance, Keynes states that "where general statistics
are available, the numerical probability which might be derived from them is
inapplicable because of the presence of additional knowledge with regards to the particular
case" (page 29). (He then goes on to deride Gibbon for his use of mortality tables when
he should have called for a doctor!) This shows the gap between the perspectives of
Keynes and those of Jeffreys (1931, 1939) and de Finetti (1937, reprinted as 1974), the
latter focussing on the exchangeability of events to derive the existence of a common if
unknown probability distribution. It is all the more surprising given the earlier works
of Pearson and Edgeworth in the 1880's that developed mathematical statistics towards
a general theory of inference (see Stigler 1986, Chapters 9 and 10). The above quote,
given in Stigler (1999) as a request from Keynes to Pearson to help as examiner at
the University of London, may however explain the reluctance of Keynes to engage in
deeper mathematics.

The use of particular sampling distributions (called laws of errors) in the
reproduction of his 1911 paper on averages (see Section 7) is not discussed from a modelling
perspective but simply to back up the standard types of averages as maximum likelihood
estimators.^4 "The general evidence which justifies our assumption of the particular law
of errors which we do assume" (page 195) is never discussed further in the Treatise.
Assessing the worth of a probability model against a dataset was not an issue for Keynes,
although Pearson had earlier addressed the problem.



^3 Discussing a similar point about Keynes, Stigler (1999, page 48) concludes that, with "this
standard, it is difficult to conceive of any policy issue that can be investigated statistically".

^4 Aldrich (2008a) argues quite convincingly that these are maximum a posteriori (MAP) estimates
corresponding to flat priors, making the term most probable more coherent. When stating the problem
on page 194, Keynes indeed invokes the "theorem of inverse probability", namely Bayes' theorem on
a finite parameter set. This theorem is also described in the 1911 paper. Again, this shows that part
of Keynes' reasoning is still grounded within the principles of inverse probability.
5 Criticisms of frequentism

"The frequency theory, therefore, entirely fails to explain or justify the most 
important source of the most usual arguments in the field of probable infer- 
ence." A Treatise on Probability, page 108. 

Given Keynes' reluctance, mentioned above, to accept numerical probabilities, models
and reproducibility, it is no surprise that only extensive frequency stability is
acceptable to him: "The 'Law of Large Numbers' is not at all a good name for the principle
that underlies Statistical Induction. The 'Stability of Statistical Frequencies' would be
a much better name for it." (page 336). He has a strong a priori against the almost sure
stabilisation of an iid random sequence, especially when considering real data. This is
historically intriguing given the derivations by Bernoulli, de Moivre, and Laplace of the
Law of Large Numbers more than a century earlier.

"Some statistical frequencies are, with narrower or wider limits, stable. But 
stable frequencies are not very common, and cannot be assumed lightly." A 
Treatise on Probability, page 336. 

The criticism of the Central Limit Theorem (CLT), called Bernoulli's Theorem in the
book,^5 that is found in Chapter XXIX is rather curious, in that it confuses model
probabilities p with probability estimates p (the identical notation being an indicator
of this confusion).^6 For instance, on page 343, Keynes criticises the use of the CLT for
the Bernoulli distribution $\mathcal{B}(1/2)$ and a coin tossing experiment as, when
"heads fall at every one of the first 999 tosses, it becomes reasonable to estimate the
probability of heads at much more than 1/2". This argument is therefore confusing the
probability model $\mathcal{B}(p)$ with the estimation problem. Keynes' inability to
recognise the distinction may stem from his reluctance to use unknown probabilities such
as p. Similarly, on pages 349-350, when considering the proportion of male births, an
example dating back to Laplace, Keynes states that the probability of having n male
births in a row is not $p^n$, if p is the probability of a single male birth, but

$$\frac{r}{s}\cdot\frac{r+1}{s+1}\cdot\frac{r+2}{s+2}\cdots\frac{r+n-1}{s+n-1}$$

if s is the number of births observed so far, and r the number of male births. The latter
is a sequential construction based on individual estimates for each new observation,
neither a true (predictive) probability nor a genuine plug-in estimate. Most
interestingly, under a flat prior on p, the predictive (marginal) probability of seeing n
male births in a row is

$$\int_0^1 p^n\,\frac{(s+1)!}{r!(s-r)!}\,p^r(1-p)^{s-r}\,dp
= \frac{(s+1)!}{r!(s-r)!}\,\frac{(r+n)!(s-r)!}{(n+s+1)!}
= \frac{(r+1)\cdots(r+n)}{(s+2)\cdots(s+n+1)}\,.$$

We thus come to the conclusion that Keynes' solution corresponds to using Haldane's
(1932) prior, $\pi(p) \propto 1/\{p(1-p)\}$, whose impropriety difficulties (Robert 2001)
were not an issue at the time and even later in Jeffreys (1939).

^5 Bernoulli's Theorem is historically the weak Law of Large Numbers but Keynes presents this result
in conjunction with (a) a description of the binomial $\mathcal{B}(n,p)$ distribution and (b) the normal
(CLT) approximation to the binomial cdf, a result he calls Stirling's theorem. While Edgeworth had a clear
influence on Keynes in Cambridge, his expansions providing a better approximation than the CLT are
not mentioned (Hall 1992).

^6 Jeffreys (1931, p. 224) stresses that "Keynes' postulate might fit the assigned probabilities instead
of the true probabilities".
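
The identification of Keynes' product with a predictive probability under Haldane's prior can be checked with exact arithmetic. The following Python sketch is ours: the function names and the values of r, s and n are illustrative assumptions, and the code relies on the standard fact that a prior proportional to p^(a-1)(1-p)^(b-1) leads to a Beta(r+a, s-r+b) posterior (proper here as long as 0 < r < s).

```python
from fractions import Fraction
from math import factorial


def keynes_product(r, s, n):
    """Keynes' sequential product r/s * (r+1)/(s+1) * ... * (r+n-1)/(s+n-1)."""
    out = Fraction(1)
    for j in range(n):
        out *= Fraction(r + j, s + j)
    return out


def beta_fn(a, b):
    """Beta function B(a, b) for positive integers a and b, as an exact fraction."""
    return Fraction(factorial(a - 1) * factorial(b - 1), factorial(a + b - 1))


def predictive(r, s, n, a, b):
    """P(next n births are all male | r males among s births) under a prior on p
    proportional to p^(a-1) (1-p)^(b-1); a = b = 0 is Haldane's improper prior."""
    # The posterior is Beta(r + a, s - r + b), so the predictive probability
    # of n further successes is a ratio of two Beta functions.
    return beta_fn(r + a + n, s - r + b) / beta_fn(r + a, s - r + b)


r, s, n = 7, 12, 3                       # illustrative values only
flat = predictive(r, s, n, 1, 1)         # flat prior on p
haldane = predictive(r, s, n, 0, 0)      # Haldane's prior

if __name__ == "__main__":
    print("Keynes' product:", keynes_product(r, s, n))
    print("Haldane prior  :", haldane)
    print("flat prior     :", flat)
```

With these values, Keynes' product and the Haldane predictive coincide, while the flat prior gives a different (smaller) probability, in line with the closed form above.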



"It seldom happens that we can apply Bernoulli's theorem to a long series 
of natural events." A Treatise on Probability, page 343. 



That Keynes concludes that Bernoulli's Theorem (a simple version of the CLT in
the binomial case) does not hold exactly in this setting is clearly inappropriate. When
considering that "knowledge of the result of one trial is capable of influencing the
probability at the next", he is confusing the "true" probability with the estimated one.
The same criticism applies to Keynes' remark that "a knowledge of some members of a
population may give us a clue to the general character of the population in question"
(page 346), a remark that bears witness to Keynes' skepticism about the relevance of
probabilistic models. From a Bayesian perspective, it appears that Keynes mixes sampling
distributions with marginal distributions, as in the latent variable example of page 346
dealing with observations from $\mathcal{B}(p)$ when $p \in \{p_1,\ldots,p_k\}$: the
observations become dependent when integrating out p. The statement "if we knew the real
value of the quantity, the different measurements of it would be independent" (page 195)
may be understood in this light, even though this is a risky extrapolation given both the
book's stance on Bayesian statistics and the lack of evidence that Keynes mastered this
type of mathematical technique (as shown by the quote at the opening of Section 4).
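
The dependence created by integrating out p can be made concrete with a two-point example. This Python sketch is only an illustration: the candidate values 1/4 and 3/4 and their equal weights are hypothetical choices of ours, not taken from the Treatise.

```python
from fractions import Fraction

# Hypothetical finite set of candidate values for p, with equal prior weights.
values = [Fraction(1, 4), Fraction(3, 4)]
weights = [Fraction(1, 2), Fraction(1, 2)]

# Marginal probability of a success on a single trial, p being integrated out.
p_one = sum(w * p for w, p in zip(weights, values))

# Marginal probability of two successes on trials that are independent
# *given* p (the sampling distribution is Bernoulli(p) for each trial).
p_two = sum(w * p * p for w, p in zip(weights, values))

# Conditional probability of a second success given a first one.
p_second_given_first = p_two / p_one

if __name__ == "__main__":
    print("P(success)                  =", p_one)
    print("P(two successes)            =", p_two)
    print("P(success)^2                =", p_one ** 2)
    print("P(second | first a success) =", p_second_given_first)
```

Conditionally on p the trials are independent, yet marginally a first success raises the probability of a second one from 1/2 to 5/8, which is the sense in which "knowledge of the result of one trial" matters once p is unknown.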



6 Keynes' views on statistical inference 

In the continuation of the quote from page 392 given above, the Treatise argues most
vigorously against mathematical statistics by stating that the purpose of statistics ought
to be strictly limited to preparing the numerical aspects of our material in an intelligible
form. Keynes thus separates inference (the usual inductive methods) from statistics and
clearly shows his skepticism about extending statistics beyond a descriptive tool.

"The statistician is less concerned to discover the precise conditions in which 
a description can be legitimately extended by induction." A Treatise on 
Probability, page 327. 
The focus of statistical inference as described in Part V is reduced to a probability
assessment: "In the first type of argument we seek to infer an unknown statistical
frequency from an a priori probability. In the second type we are engaged on the inverse
operation, and seek to base the calculation of a probability on an observed statistical
frequency. In the third type we seek to pass from an observed statistical frequency,
not merely to the probability of an individual occurrence, but to the probable value
of other unknown statistical frequencies" (page 331). This is actually rather surprising
given the overall negative tone of the Treatise about probability theory.

The first item above is a probabilistic issue and is treated as such in Chapter XXIX,
which covers both the normal and Poisson limit theorems, as well as Cebysev's
inequality. Further criticisms of "Bernoulli's Theorem" found in this chapter are limited
to the fact that finding independent and identically distributed (i.i.d.) replications
is a condition that is "seldom fulfilled" (page 342). The 1901 proof by Liapounov of the
CLT for general independent random variables is not mentioned in Keynes' book and was
presumably unknown to the author. Instead, he refers to Poisson for a series of
independent random variables with different distributions, warning that "it is important
not to exaggerate the degree to which Poisson's method has extended the application of
Bernoulli's results" (page 346). Although Cebysev's inequality has had very little impact
on statistical practice, except when constructing conservative confidence intervals,
Keynes is clearly impressed by the result (of which he provides a very convoluted proof on
pages 353-355) and he concludes (rather unfairly, since Laplace wrote one century before)
that "Laplacian mathematics is really obsolete and should be replaced by the very
beautiful work which we owe to these Russians" (page 355). Chapter XXIX terminates with an
interesting section on simulation experiments aiming at an empirical verification of the
CLT, although Keynes' conclusion on a very long dice experiment is that, given that the
frequencies do not match "what theory would predict" (page 363), the die used in this
experiment was quite irregular (or maybe worn out by the 20,000 tosses!).
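
To give a sense of why intervals based on Cebysev's inequality are called conservative, the following Python sketch compares the half-width they produce for a binomial proportion with the one obtained from the normal (CLT) approximation. The sample size, the level and the use of the worst-case variance bound 1/(4n) are our illustrative assumptions.

```python
import math
from statistics import NormalDist


def chebyshev_halfwidth(n, alpha):
    """Half-width eps such that Chebyshev's inequality guarantees
    P(|x/n - p| >= eps) <= alpha, using the bound var(x/n) <= 1/(4n)."""
    return 1.0 / math.sqrt(4.0 * n * alpha)


def normal_halfwidth(n, alpha):
    """Half-width from the normal (CLT) approximation, same variance bound."""
    z = NormalDist().inv_cdf(1.0 - alpha / 2.0)
    return z / math.sqrt(4.0 * n)


n, alpha = 1000, 0.05          # illustrative values only
cheb = chebyshev_halfwidth(n, alpha)
clt = normal_halfwidth(n, alpha)

if __name__ == "__main__":
    print(f"Chebyshev half-width: {cheb:.4f}")
    print(f"normal half-width   : {clt:.4f}")
    print(f"ratio               : {cheb / clt:.2f}")
```

The ratio of the two half-widths does not depend on n: it equals 1/(z sqrt(alpha)), so at the 5% level the Chebyshev interval is more than twice as wide as the normal one.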

"I do not believe that there is any direct and simple method by which we can 
make the transition from an observed numerical frequency to a numerical 
measure of probability." A Treatise on Probability, page 367. 

As illustrated by the above quote and discussed in the next section, Keynes does
not consider Laplace's (i.e. the Bayesian) approach to be logically valid and he similarly
criticises normal approximations à la Bernoulli, seeing both as "mathematical
charlatanery" (page 367)! Even the (maximum likelihood) solution of estimating p with the
frequency x/n when $x \sim \mathcal{B}(n,p)$ does not satisfy him (as being "incapable of
a proof", page 371). Note that the maximum likelihood estimator is called the most probable
value throughout the book, in accordance with the denomination current at the time
(Hald 1999), without Keynes objecting to its Bayesian flavour. Obviously, given that
he wrote the main part of the book before the war, he could not have used Fisher's
denomination of maximum likelihood estimation, since its introduction dates from 1922.^7
The method of least squares is also heavily attacked in Chapter XVII as "surrounded
by an unnecessary air of mystery" (page 209), while Keynes concedes on the next page that
it exactly corresponds to assuming the normal distribution on the observations (a fact
that is not correct either). Once again, Keynes is missing the recent developments of
Pearson and Yule on the estimation of regression coefficients, following the publication
in 1889 of Natural Inheritance by Francis Galton and his discovery of regression ("one of
the most attractive triumphs in the history of statistics", according to Stigler, 1999,
page 186). Galton is only quoted twice in the Treatise and for marginal reasons, while
regression does not appear at all.



7 On the principal averages 

Chapter XVII reproduces Keynes' 1911 paper in the Journal of the Royal Statistical
Society on the characterisation of the distributions leading to specific standard averages
as MAPs under a flat prior, i.e. modern MLEs, which means obtaining classes of
densities for which the MLEs are the arithmetic, the geometric and the harmonic averages,
and the median, respectively. The earlier decision-theoretic justifications of the
arithmetic mean by Laplace and Gauß are derided as depending on "doubtful and arbitrary
assumptions" (page 206), while the lack of reparameterisation invariance of the
arithmetic average as MLE is clearly stated (on page 208). This classification of standard
averages as MLEs is more of a technical exercise than of true methodological relevance,
because the classification of distributions ("laws") that give the arithmetic, geometric,
harmonic mean or the median as MLEs is obviously parameterisation-dependent, a fact
later noted by Keynes but omitted at this stage despite his criticism of Laplace's
principle on the same ground. The derivation of the densities $f(x,\theta)$ of the
distributions is based on the condition that the likelihood equation

$$\sum_{i=1}^n \frac{\partial}{\partial\theta}\,\log f(y_i,\theta) = 0$$

is satisfied for one of the four empirical averages, using differential calculus despite
the fact that Keynes earlier derived (on page 194) Bayes' theorem by assuming the
parameter space to be discrete.^8 Under regularity assumptions, in the case of the
arithmetic mean, this leads to the family of distributions

$$f(x,\theta) = \exp\{\phi'(\theta)(x-\theta) - \phi(\theta) + \psi(x)\}\,,$$

where $\phi$ and $\psi$ are arbitrary functions such that $\phi$ is twice differentiable
and $f(x,\theta)$ is a density in x, meaning that

$$\phi(\theta) = \log\int \exp\{\phi'(\theta)(x-\theta) + \psi(x)\}\,dx\,,$$

a constraint missed by Keynes. (The same argument is reproduced in Jeffreys, 1939,
page 167.)

^7 Both "Fisher" and "estimation" entries are missing from the index of the Treatise.

^8 Keynes notes that "differentiation assumes that the possible values of y [meaning $\theta$ in our
notations] are so numerous and so uniformly distributed that we may regard them as continuous" (page 196).

While we cannot judge the level of novelty in Keynes' derivation with respect to
earlier works, this derivation interestingly produces a generic form of unidimensional
exponential family, twenty-five years before their rederivation by Darmois (1935), Pitman
(1936) and Koopman (1936) as characterising distributions with sufficient statistics of
constant dimension. The derivation of distributions for which the geometric and the
harmonic means are MLEs then follows by a change of variables, $y = \log x$,
$\lambda = \log\theta$ and $y = 1/x$, $\lambda = 1/\theta$, respectively. In those
different derivations, the normalisation issue is treated quite off-handedly by Keynes,
witness the function

$$f(x,\theta) = A\left(\frac{\theta}{x}\right)^{k\theta} e^{-k\theta}$$

at the bottom of page 198, which is not integrable per se. Similarly, the derivation of the
log-normal density on page 199 is missing the Jacobian factor 1/x (or 1/yq in Keynes'
notations) and the same problem arises for the inverse-normal density, which should be

$$f(x,\theta) = A\,e^{-k^2(x-\theta)^2/\theta^2x^2}\,\frac{1}{x^2}$$

instead of $A\exp\{-k^2(\theta-x)^2/x\}$ (page 200). Finally, the derivation of the
distributions producing the median as MLE is rather dubious because it does not seem to
account for the non-differentiability of the absolute distance at every point of the
sample. Furthermore, Keynes' general solution

$$f(x,\theta) = A\exp\left\{\int \frac{x-\theta}{|x-\theta|}\,\phi''(\theta)\,d\theta + \psi(x)\right\}$$

where the integral is interpreted as an anti-derivative, is such that the recovery of
Laplace's distribution, $f(x,\theta) \propto \exp\{-k^2|x-\theta|\}$, involves setting (page 201)

$$\psi(x) = \frac{\theta-x}{|\theta-x|}\,k^2x\,,$$

hence making $\psi$ dependent on $\theta$ as well. In his summary (pages 204-205), Keynes
(a) reintroduces a constant A for the normalisation of the density in the case of the 
arithmetic mean and (b) produces 



$$f(x,\theta) = A\exp\left\{\phi'(\theta)\,\frac{x-\theta}{|x-\theta|} + \psi(x)\right\}$$

in the case of the median. This latter form is equally puzzling because the ratio in the
exponential is equal to the sign of $x - \theta$, leading to a possibly different weighting
of $\exp\psi(x)$ when $x < \theta$ and when $x > \theta$.
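
The correspondence between laws of errors and standard averages discussed in this section can be checked numerically on the simplest member of each family. The Python sketch below is ours and rests on several assumptions made for convenience: a made-up sample, a scale fixed at one, and a crude grid search in place of solving the likelihood equation. The Jacobian factors 1/x and 1/x^2 are left out of the log-likelihoods since they do not involve the parameter and hence do not move the maximiser.

```python
import math
import statistics

data = [1.2, 1.9, 2.3, 3.1, 4.8]     # illustrative sample only


# Log-likelihoods in theta, up to additive constants, with unit scale.
def loglik_normal(theta):            # normal law of errors on x
    return -sum((x - theta) ** 2 for x in data)


def loglik_laplace(theta):           # Laplace (double exponential) law on x
    return -sum(abs(x - theta) for x in data)


def loglik_lognormal(theta):         # normal law on log x, centred at log theta
    return -sum((math.log(x) - math.log(theta)) ** 2 for x in data)


def loglik_inverse(theta):           # normal law on 1/x, centred at 1/theta
    return -sum((1 / x - 1 / theta) ** 2 for x in data)


def grid_argmax(loglik, lo=1.0, hi=5.0, step=0.001):
    """Crude maximiser: evaluate loglik on a regular grid, keep the best point."""
    grid = [lo + i * step for i in range(int(round((hi - lo) / step)) + 1)]
    return max(grid, key=loglik)


estimates = {
    "arithmetic mean": grid_argmax(loglik_normal),
    "median": grid_argmax(loglik_laplace),
    "geometric mean": grid_argmax(loglik_lognormal),
    "harmonic mean": grid_argmax(loglik_inverse),
}
targets = {
    "arithmetic mean": statistics.mean(data),
    "median": statistics.median(data),
    "geometric mean": statistics.geometric_mean(data),
    "harmonic mean": statistics.harmonic_mean(data),
}

if __name__ == "__main__":
    for name in estimates:
        print(f"{name:16s} MLE {estimates[name]:.3f}  average {targets[name]:.3f}")
```

Each grid maximiser agrees with the corresponding average up to the grid resolution, which illustrates the classification without addressing the normalisation issues raised above.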
8 A reactionary proposal



After an extensive criticism of the methods of the time and of the use of mathematical
models as a basis for statistical inference (see Section 4), Keynes concludes the Treatise
with a defence of the method advocated by the late Lexis (who died in 1914), at the
very moment Fisher (1925) was defining statistics as "mathematics applied to data".
As analysed by Aldrich (2008a, Section 5), the defence is paired with Keynes' attempt
to link Lexis' theory to his own principles of analogy in induction, as advanced in Part
III of the Treatise. The following quote indicates why the attempt failed.

"I have experienced exceptional difficulty, as the reader may discover for
himself in the following pages, both in clearing up my own mind about it
and in expounding my conclusions precisely and intelligently." A Treatise
on Probability, page 409.



When considered from a modern perspective, Chap- 
ters XXXII and XXXIII advocate a very empirical ap- 
proach to statistics (which, in an anachronistic way, pre- 
figures bootstrap), namely to derive the stability of a 
probability estimate by subdividing a series into a large 
enough number of subs Eries in order to assess the vari- 
ability of the estimate or to spot heterogeneity. Keynes 
associates this approach with Lexis and appears quite 
supportive of the latter, even though he comments that 
"Lexis has not pushed his analysis far enough" (page 
401), before complaining about von Bortkiewicz, "pre- 
ferring algebra to earth" (page 404). As highlighted by 
the above quote, the Treatise faces difficulties in build- 
ing a general theory around this approach and the description of the mechanism for 
dividing the series remains unclear throughout the chapters, since it seems to depend 
on covariates. For instance, the sentence "all conceivable resolutions into partial groups" 
(page 395) is to be opposed to breaking "statistical material into groups by date, place, 
and any other characteristic which our generalisation proposes to treat as irrelevant" 
(page 397). The model thus constructed has a mixture flavour when the groups are made 
per chance, or a hierarchical one otherwise. Indeed, the description of "the probability 
p for the group made up as follows" (page 395) 

$$p = \frac{z_1}{z}\, p_1 + \frac{z_2}{z}\, p_2 + \cdots$$

clearly corresponds to a mixture, the $z_i$'s being the component sizes. 

In any case, the description of Lexis' theory sums up as testing for variations between 
groups, i.e. exposing a possible extra-binomial variation (called supra-normal by Keynes). 
Keynes also mentions the possibility of an insufficient variation (the subnormal case), 
which is attributed to dependence in the data and which "cannot be handled by purely 
statistical methods" (page 399). A modern account of Lexis' procedure for testing 
stability and of why the author "failed" (and is now largely forgotten) is given in Stigler 
(1986, Chapter 6), who adds on page 238 that Keynes, as one of the few followers of 
Lexis, missed the point that "simple urns models were insufficiently rich to support the 
needs of a modern statistical analysis". 
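To make the idea of comparing between-group variation with a binomial benchmark concrete, here is a minimal Python sketch. It is not Lexis' or Keynes' own procedure: it uses a standard chi-square-type dispersion index as a stand-in, the function names are ours, and the counts are made up for illustration. The pooled proportion it computes is the weighted average of the group frequencies, with weights given by the relative group sizes.

```python
def pooled_proportion(successes, sizes):
    """Pooled frequency p = sum_i (z_i / z) * p_i, with p_i = successes[i] / sizes[i]."""
    return sum(successes) / sum(sizes)


def dispersion_index(successes, sizes):
    """Between-group variation of the p_i relative to the binomial benchmark.

    Values near 1 are compatible with a common binomial probability,
    values well above 1 suggest supra-normal (extra-binomial) dispersion,
    values well below 1 suggest subnormal dispersion.
    Assumes at least two groups and a pooled proportion strictly between 0 and 1.
    """
    k = len(sizes)
    p = pooled_proportion(successes, sizes)
    # Pearson-type statistic: size-weighted squared deviations over p(1 - p).
    chi2 = sum(z * (s / z - p) ** 2 for s, z in zip(successes, sizes)) / (p * (1 - p))
    return chi2 / (k - 1)


# Hypothetical counts, made up for illustration only.
stable = dispersion_index([48, 52, 50, 50], [100, 100, 100, 100])
heterogeneous = dispersion_index([10, 90, 10, 90], [100, 100, 100, 100])
```

In this toy example the first series gives an index below 1 and the second an index far above 1, which is the kind of contrast the subnormal and supra-normal labels are meant to capture. How large a departure from 1 counts as significant is precisely what the Treatise leaves unspecified.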

"Statistical induction is not really about the particular instance at all but a 
series." A Treatise on Probability, page 411. 

As the final chapter, Chapter XXXIII is Keynes' last attempt at defending his own 
views about a constructive theory of statistical inference. However, it mostly sounds like 
a rephrasing of Lexis' views, Keynes' main point being that one should work with "series 
of series of instances" (page 407) in order to check for the stability of the assumed 
model. The point made in the above quote is valid at face value, but the attempt at 
checking that all subdivisions of a dataset show the same variability ("until a prima 
facie case has been established for the existence of a stable probable frequency, we have 
but a flimsy basis for any statistical induction", page 415) is doomed when pushed to its 
extreme, namely the division of the data into individual observations. Furthermore, we 
again stress that the Treatise never explicitly derives a testing methodology in the sense 
of Gosset or of Fisher, despite mentions made of "significant stability" (pages 408 and 415). 
When discussing Lexis' dispersion in Chapter XXXII, Keynes refers to a case when 
"the dispersion conformed approximately to the (...) normal law of error" (page 358), 
but, again, no entry is found on the contemporary Student's t tests or Pearson's tests. 
Keynes' earlier criticisms about the extension of an observed model to future 
occurrences also apply in this setting, a fact acknowledged by the author: "it is not 
conclusive and I must leave to others its more exact elucidation" (page 419). Besides, 
the assessment of stability is not detailed and, while it seems to be based on normal 
approximations (to the binomial), the facts that the same data are used repeatedly and 
thus that the test statistics are dependent appear to have been overlooked by Keynes. 



9 Inverse Probability 

As already discussed in footnotes 1 and 4, the foundations of statistics were not 
sufficiently settled at the time Keynes wrote his book to allow for a clear distinction between 
frequentist and Bayesian philosophies. The choice of prior distributions had already 
come under attack in the books of Chrystal, Venn and Bertrand, but the alternative 
construction of a non-Bayesian setting would have to wait a few more years for Fisher's 
(1925) new perspective. 



The whole book considers handling dependent observations an impossible task, despite Markov's 
introduction of Markov chains a few years earlier. As pointed out by Aldrich (2008), the lack of 
feedback from Yule in A Treatise on Probability is apparent from the pessimistic views of the author 
about dependent series, despite the proximity of the authors in Cambridge, since Yule had already 
engaged in building a statistical analysis of time series. 

Ronald Fisher also reviewed Keynes' book in 1923, concluding that Keynes' perspective 
on statistics was useless, as described in Aldrich (2008b). 

"Bayes' enunciation is strictly correct and its method of arriving at it shows 
its true logical connection with more fundamental principles, whereas Laplace's 
enunciation gives it the appearance of a new principle specially introduced 
for the solution of causal problems." A Treatise on Probability, page 
175. 



When discussing the history of Bayes' theorem in Chapter XVI, Keynes considers, 
as shown by the above quote, that only Bayes got his proof right and that subsequent 
writers, first and foremost Laplace, muddled the issue (except for Markov)! While the 
author of the Treatise rightly separates the mathematical result represented by Bayes' 
theorem from its use in statistical inference, Keynes misses the fact that Laplace 
independently derived Bayes' theorem from a purely mathematical perspective, before 
applying (much later) inverse probability principles in statistical problems. (Confusing 
Bayes' theorem with not-yet Bayesian statistics seemed to be quite common at 
the time since, as reported in Stigler (1999), Karl Pearson equates Bayes' theorem with 
Laplace's Principle of Non-Sufficient Reason covered below.) 

An interesting discussion in the Treatise revolves around the (obvious) fact that the 
prior probabilities of the different causes should be taken into account ("the necessity 
in general of taking into account the à priori probability of the different causes", page 
178). But one argument sheds light on the difficulty Keynes had with the updating of 
probabilities, as mentioned in the paragraph about the CLT: "how do we know that 
the possibilities admissible à posteriori are still, as they were assumed to be à priori, 
equal possibilities?" (page 176). This section considers the specific arguments Keynes 
advanced against Bayesian principles. (We note again that statistical practice at 
the time had both frequentist and Bayesian, i.e. sampling and posterior, arguments 
mostly mixed together, as detailed in Aldrich 2008a.) 



9.1 Against the Principle of Indifference 

"My criticism will be purely destructive and I will not attempt to indicate 
my own way out of the difficulties." A Treatise on Probability, page 42. 

The Principle of Indifference is Keynes' renaming of the Principle of Non-Sufficient 
Reason advocated by Laplace and his followers for using (possibly improper) uniform 
prior distributions. Following the above preamble, Keynes (rightly) shows the 
inconsistency of this approach under (a) a refinement of the available alternatives (pages 42-43) 
and (b) a non-linear reparameterisation of the model (page 45), the example being the 
change from v into 1/v. An extension of this argument on page 47 discusses the 
dependence of the uniform distribution on the dominating measure (although the book 
obviously does not dally with a measure theory not yet finalised at the time and anyway 
beyond Keynes' reach), as illustrated by Bertrand's paradox. This paradox points out 
the lack of meaning of a "random chord" of a circle without a proper probability 
structure and is reanalysed in Jaynes (2003, page 386) from an objective Bayes perspective, 
where the author defends the maximum invariance principle. In the following and less 
convincing paragraphs of Chapter IV, Keynes finds fault with basic game examples 
(including the Monty Hall problem) where again the equidistribution depends on the 
reference measure. 

Note the accents used in à priori and à posteriori, although there are no accents in Latin. They 
may have stemmed from the way French writers first used those terms, even though a posteriori is also 
found in Jakob Bernoulli's Ars Conjectandi... The accent has vanished by the time of Jeffreys (1931). 
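The inconsistency under the change from v into 1/v can be checked with a few lines of exact arithmetic. The sketch below is only an illustration: the range [1, 3] for v and the event "v between 1 and 2" are our assumptions, chosen for convenience, and `uniform_prob` is a helper name of ours.

```python
from fractions import Fraction


def uniform_prob(lo, hi, a, b):
    """P(a <= X <= b) for X uniform on [lo, hi], assuming lo <= a <= b <= hi."""
    return Fraction(b - a) / Fraction(hi - lo)


# Indifference applied to v, assumed to lie in [1, 3]: event 1 <= v <= 2.
p_uniform_in_v = uniform_prob(Fraction(1), Fraction(3), Fraction(1), Fraction(2))

# Indifference applied to w = 1/v, which then lies in [1/3, 1].
# The same event 1 <= v <= 2 reads 1/2 <= w <= 1 in the new parameterisation.
p_uniform_in_w = uniform_prob(Fraction(1, 3), Fraction(1), Fraction(1, 2), Fraction(1))

print(p_uniform_in_v, p_uniform_in_w)  # prints: 1/2 3/4
```

The same event receives probability 1/2 or 3/4 depending on which parameterisation is declared "indifferent", which is the core of the objection on page 45.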



"Who could suppose that the probability of a purely hypothetical event, of 
whatever complexity (...), and which has failed to occur on the one occasion 
on which the hypothetical conditions were fulfilled is no less than 1/3?" A 
Treatise on Probability, page 378. 

Similar arguments are advanced in Chapter XXX when debating about Laplace's 
law of succession. Those are standard criticisms found for instance in the earlier 
Bertrand (1889). Namely, putting a uniform distribution on all possible alternatives is 
not coherent given that a subdivision of an alternative into further cases modifies the 
uniform prior. And, furthermore, a non-linear reparameterisation of a probability p into 
a new parameter q fails to carry uniformity from p to q. In concordance with the spirit 
of the time (Lhoste 1923; Broemeling and Broemeling 2003), the debate about whether or not the 
Principle of Indifference holds makes some sense, as shown by the subsequent defence 
by Jeffreys, but it does not hold much appeal nowadays because priors are recognised 
as reference tools for handling data rather than expressions of truth or of "objective 
probabilities". 
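For reference, the value 1/3 in the quote from page 378 is what Laplace's law of succession returns after a single failed trial. A short derivation follows, assuming a uniform prior on the unknown probability p and n independent trials with s successes (the symbols n and s are our notation, not Keynes'):

```latex
\[
P(\text{success at trial } n+1 \mid s \text{ successes in } n \text{ trials})
  = \frac{\int_0^1 p^{\,s+1}(1-p)^{\,n-s}\,\mathrm{d}p}
         {\int_0^1 p^{\,s}(1-p)^{\,n-s}\,\mathrm{d}p}
  = \frac{s+1}{n+2},
\qquad\text{so that } n=1,\ s=0 \text{ gives } \frac{1}{3}.
\]
```

The ratio of the two Beta integrals simplifies to (s+1)/(n+2), and the result depends entirely on the uniform prior, which is why the criticisms of the Principle of Indifference carry over to the law of succession.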

Chapter XXXI debates on the inversion of Bernoulli's Theorem, a notion that we 
interpret as Bayes' formula applied to the Gaussian approximation to the distribution 
of an empirical frequency: on page 387 

$$\frac{f(q') \mid h \cdot f(q)}{\sum f(q') \mid h \cdot f(q)},$$

apparently meaning 

$$\frac{f(x \mid \theta)\,\pi(\theta)}{\int f(x \mid \theta)\,\pi(\theta)\,\mathrm{d}\theta}$$

in modern notation, is associated with the statement that "all the terms can be 
determined numerically by Bernoulli's Theorem". Since this representation is somehow 
based on a flat reference prior (although the formula at the bottom of page 386, which 
seems to involve two distributions on the parameter θ, is incomprehensible), and thus 
on the Principle of Indifference, it is rejected by Keynes, who cannot see a "justification 
for the assumption that all possible values of q are à priori equally likely" (page 387). 
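Under our reading of this "inversion", a flat prior on q turns the binomial likelihood directly into a posterior, which a Gaussian centred at the empirical frequency then approximates. The Python sketch below compares the two numerically. It is an illustration under stated assumptions rather than a reconstruction of Keynes' computation: the counts (300 successes in 1000 trials), the interval and the function names are all ours.

```python
import math


def flat_prior_posterior_prob(s, n, lo, hi, grid=20000):
    """P(lo <= q <= hi | s successes in n trials) under a uniform prior on q.

    Bayes' formula with the integrals replaced by midpoint Riemann sums;
    the binomial coefficient cancels between numerator and denominator.
    """
    mids = [(i + 0.5) / grid for i in range(grid)]
    logs = [s * math.log(q) + (n - s) * math.log(1.0 - q) for q in mids]
    top = max(logs)  # subtract the maximum to avoid numerical underflow
    weights = [math.exp(value - top) for value in logs]
    inside = sum(w for q, w in zip(mids, weights) if lo <= q <= hi)
    return inside / sum(weights)


def normal_approx_prob(s, n, lo, hi):
    """Same probability from a Gaussian centred at s/n with binomial variance."""
    mean = s / n
    sd = math.sqrt(mean * (1.0 - mean) / n)

    def cdf(x):
        return 0.5 * (1.0 + math.erf((x - mean) / (sd * math.sqrt(2.0))))

    return cdf(hi) - cdf(lo)


# Hypothetical data, made up for illustration only.
exact = flat_prior_posterior_prob(300, 1000, 0.27, 0.33)
approx = normal_approx_prob(300, 1000, 0.27, 0.33)
```

With this many trials the two values agree closely, which illustrates why the terms "can be determined numerically" once the flat prior is granted. The agreement says nothing about whether the flat prior itself is justified, which is the point Keynes contests.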



We note, however, that the general setup of modern measure theory had already been given by Henri 
Lebesgue in 1903 in Annali di Matematica. 

Although no black swan glides in, the section contains the obligatory example of the probability of 
the sun rising tomorrow that is found in almost every treatise on induction since Hume (Taleb 2008). 

9.2 Against probabilising the unknown 

"Laplace's theory requires the employment of both of two inconsistent meth- 
ods." A Treatise on Probability, page 372. 

The criticism of Bayesian (Laplacian) techniques goes further than the rather 
standard debate about the choice of the prior. For Keynes, adopting a perspective that 
unknown probabilities can be modelled as random variables is beyond logical reasoning. 
Because an unknown probability is indeterminate, Keynes considers that "there is no 
such value" (page 373). Therefore, the Bayesian notion of setting a probability 
distribution over the unit interval is both illogical and impractical, since "if a probability is 
unknown, surely the probability, relative to the same evidence, of this probability has 
a given value, is also unknown" (page 373). Keynes then argues that, if the hyperprior 
probability is unknown, it should also be endowed with its own probability measure, 
inducing "an infinite regress" (page 373). 



10 Conclusion 

In conclusion, while Keynes' early interest in Probability and in Statistics is unarguable, 
A Treatise on Probability could not have made a lasting contribution to Statistics, 
even from an historical perspective, given the immense developments taking place in 
Statistics at the turn of the Century or in the neighbouring decades. The Treatise 
appears in the end as a scholarly exercise focussing on past books and lacking a vision 
of developments that would have made Keynes a statistician of his time, while the 
aggressive tone adopted towards most of the writers quoted in the book is undeserved 
when comparing the achievements of both camps. It is therefore no surprise the book 
has had no influence on the probability and statistics communities: it would make no 
sense to advise students in the field to put aside major treatises to ponder through A 
Treatise on Probability as, to adopt Fisher's (1922) harsh but still relevant words, "they 
would be turned away, some in disgust, and most in ignorance, from one of the most 
promising branches of mathematics." 

A more positive perspective (see, e.g., Brady 2004) is to consider that Keynes' stance prefigures the 
theory of imprecise probabilities à la Dempster-Shafer (see, e.g., Walley 1991), as for instance when 
he states that "many probabilities can be placed between numerical limits" (page 160), but, to us, this 
mostly shows the same confusion between (interval) estimates and true probabilities found elsewhere 
in the book. The notion of replacing (pointwise) probabilities by intervals in the short Chapter XV is 
attributed to Boole (with another barb in the footnote of page 161) and it does not seem to be turned 
into any implementable version in the statistical inference section (Part V). 